In-season Prediction of Batting Averages: a Field Test of Empirical Bayes and Bayes Methodologies

Author

  • D. Brown
Abstract

Batting average is one of the principal performance measures for an individual baseball player. It is natural to model it statistically as a binomial proportion, with a given (observed) number of qualifying attempts (called “at-bats”), an observed number of successes (“hits”) distributed according to the binomial distribution, and a true (but unknown) value of p_i that represents the player’s latent ability. This data structure is common in many statistical applications, so the methodological study here has implications for a range of such applications. We look at batting records for each Major League player over the course of a single season (2005). The primary focus is on using only the batting records from an earlier part of the season (e.g., the first 3 months) to estimate the batter’s latent ability, p_i, and consequently also to predict their batting-average performance for the remainder of the season. Since we are using a season that has already concluded, we can then validate our estimation performance by comparing the estimated values to the actual values for the remainder of the season. The prediction methods investigated are motivated by empirical Bayes and hierarchical Bayes interpretations. A newly proposed nonparametric empirical Bayes procedure performs particularly well in the basic analysis of the full data set, though less well in analyses involving more homogeneous subsets of the data. In those more homogeneous situations, better performance is obtained from appropriate versions of more familiar methods. In all situations the poorest performing choice is the naïve predictor, which directly uses the current average to predict the future average. One feature of all the statistical methodologies here is the preliminary use of a new form of variance-stabilizing transformation to transform the binomial data problem into a somewhat more familiar structure involving (approximately) Normal random variables.
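
The pipeline described in the abstract (transform the binomial counts, shrink toward a common mean, map back to a batting average) can be illustrated with a minimal sketch. Everything specific here is an assumption, not taken from the abstract: the arcsine square-root transform with constants 1/4 and 1/2, its approximate variance 1/(4N), the simple method-of-moments shrinkage rule, and the function names `vst`, `inverse_vst`, and `eb_shrink`. It is a parametric stand-in for illustration only, not the nonparametric procedure the abstract proposes.

```python
import math


def vst(hits, at_bats):
    """Arcsine square-root transform (assumed form).

    The result is treated as approximately Normal with variance
    1 / (4 * at_bats).
    """
    return math.asin(math.sqrt((hits + 0.25) / (at_bats + 0.5)))


def inverse_vst(x):
    """Map a transformed value back to the probability scale."""
    return math.sin(x) ** 2


def eb_shrink(records):
    """Shrink each player's transformed average toward the group mean.

    records: list of (hits, at_bats) pairs with at_bats > 0 and at
    least two players. Returns one estimate of p_i per player.
    """
    xs = [vst(h, n) for h, n in records]
    sampling_vars = [1.0 / (4.0 * n) for _, n in records]

    mu = sum(xs) / len(xs)
    # Method-of-moments estimate of the between-player variance:
    # total sample variance minus the average sampling variance,
    # truncated at zero.
    s2 = sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)
    tau2 = max(0.0, s2 - sum(sampling_vars) / len(sampling_vars))

    estimates = []
    for x, v in zip(xs, sampling_vars):
        weight = tau2 / (tau2 + v)  # weight on the player's own data
        estimates.append(inverse_vst(mu + weight * (x - mu)))
    return estimates


if __name__ == "__main__":
    # Hypothetical first-half records: (hits, at-bats).
    first_half = [(30, 100), (45, 150), (10, 60), (60, 180)]
    for (h, n), est in zip(first_half, eb_shrink(first_half)):
        print(f"naive={h / n:.3f}  shrunk={est:.3f}")
```

Players with fewer at-bats have a larger sampling variance and are pulled more strongly toward the group mean, which is the intuition behind preferring shrinkage to the naïve predictor.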

Similar resources

Empirical Bayesball Remixed: Empirical Bayes Methods for Longitudinal Data

Abstract. Empirical Bayes methods for Gaussian and binomial compound decision problems involving longitudinal data are considered. A recent convex optimization reformulation of the nonparametric maximum likelihood estimator of Kiefer and Wolfowitz (1956) is employed to construct nonparametric Bayes rules for compound decisions. The methods are illustrated with an application to predicting baseb...


Dynamic Empirical Bayes Models and Their Applications to Longitudinal Data Analysis and Prediction

Empirical Bayes modeling has a long and celebrated history in statistical theory and applications. After a brief review of the literature, we propose a new dynamic empirical Bayes modeling approach which provides flexible and computationally efficient methods for the analysis and prediction of longitudinal data from many individuals. This dynamic empirical Bayes approach pools the cross-section...


Unobserved Heterogeneity in Longitudinal Data: An Empirical Bayes Perspective

Abstract. Empirical Bayes methods for Gaussian and binomial compound decision problems involving longitudinal data are considered. A new convex optimization formulation of the nonparametric (Kiefer-Wolfowitz) maximum likelihood estimator for mixture models is used to construct nonparametric Bayes rules for compound decisions. The methods are illustrated with some simulation examples as well as ...


Parametric Empirical Bayes Test and Its Application to Selection of Wavelet Threshold

In this article, we propose a new method for selecting level-dependent thresholds in wavelet shrinkage using the empirical Bayes framework. We employ both Bayesian and frequentist hypothesis testing instead of point estimation. The best test yields the best prior and hence the more appropriate wavelet thresholds. The standard model functions are used to illustrate the performance of the p...


Bayes, E-Bayes and Robust Bayes Premium Estimation and Prediction under the Squared Log Error Loss Function

In risk analysis based on Bayesian framework, premium calculation requires specification of a prior distribution for the risk parameter in the heterogeneous portfolio. When the prior knowledge is vague, the E-Bayesian and robust Bayesian analysis can be used to handle the uncertainty in specifying the prior distribution by considering a class of priors instead of a single prior. In th...


Publication date: 2008